Universal Kriging (UK)#

Universal Kriging (UK) is a variant of the Ordinary Kriging under non-stationary condition where mean differ in a deterministic way in different locations (local trend or drift), while only the variance is constant. This second-order stationarity (“weak stationarity”) is often a pertinent assumption with environmental exposures. In UK, usually first trend is calculated as a function of the coordinates and then the variation in what is left over (the residuals) as a random field is added to trend for making final prediction.

\[\begin{split}\begin{aligned} Z\left(s_{i}\right) &=m\left(s_{i}\right)+e\left(s_{i}\right) \\ Z(\vec{x}) &=\sum_{k=0}^{K} \beta_{k} f_{k}(\vec{x})+\varepsilon(\vec{x}) \end{aligned}\end{split}\]
  • Where the \(f_{k}\) are some global functions of position \(\vec{x}\) and the \(\beta_{k}\) are the coefficients.

  • The \(f\) are called base functions. The \(\varepsilon(\vec{x})\) is the spatially-correlated error, which is modelled as before, with a variogram, but now only considering the residuals, after the global trend is removed.

Load python modules

[1]:
import context
import salem
import numpy as np
import pandas as pd
import xarray as xr
import geopandas as gpd
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
from pykrige.uk import UniversalKriging

import plotly.express as px
from datetime import datetime

from utils.utils import pixel2poly, plotvariogram
from context import data_dir
******************************
context imported. Front of path:
/Users/rodell/krige-smoke
/Users/rodell/krige-smoke/docs/source
******************************

through /Users/rodell/krige-smoke/docs/source/context.py -- pha

Load Data#

Open the reformated data with the linear, meter-based Lambert projection (EPSG:3347). - Again this is helpful as lat/lon coordinates are less suitable for measuring distances which is important for spatial interpolation.

[2]:
df = pd.read_csv(str(data_dir) + "/obs/gpm25.csv")
gpm25 = gpd.GeoDataFrame(
    df,
    crs="EPSG:4326",
    geometry=gpd.points_from_xy(df["lon"], df["lat"]),
).to_crs("EPSG:3347")
gpm25["Easting"], gpm25["Northing"] = gpm25.geometry.x, gpm25.geometry.y
gpm25.head()
[2]:
Unnamed: 0 id datetime lat lon PM2.5 geometry Easting Northing
0 2 42073 2021-07-16 22:00:00 47.185173 -122.176855 0.862 POINT (3972238.901 1767531.888) 3.972239e+06 1.767532e+06
1 3 53069 2021-07-16 22:00:00 47.190197 -122.177992 1.078 POINT (3972419.863 1768071.699) 3.972420e+06 1.768072e+06
2 12 10808 2021-07-16 22:00:00 40.507316 -111.899188 9.780 POINT (4460286.051 743178.640) 4.460286e+06 7.431786e+05
3 16 85391 2021-07-16 22:00:00 48.454213 -123.283643 0.874 POINT (3964698.001 1931774.531) 3.964698e+06 1.931775e+06
4 21 79095 2021-07-16 22:00:00 47.672130 -122.514183 0.784 POINT (3974631.739 1827689.201) 3.974632e+06 1.827689e+06

Create Grid#

Again we will create a grid that we want to use for the interpolation.

  • The grid in the fromate of a dataset is helpful for reprojecting our covariates to match the interpolated grid.

[3]:
## define the desired  grid resolution in meters
resolution = 20_000  # grid cell size in meters

## make grid based on dataset bounds and resolution
gridx = np.arange(
    gpm25.bounds.minx.min() - resolution,
    gpm25.bounds.maxx.max() + resolution,
    resolution,
)
gridy = np.arange(
    gpm25.bounds.miny.min() - resolution,
    gpm25.bounds.maxy.max() + resolution,
    resolution,
)

## use salem to create a dataset with the grid.
krig_ds = salem.Grid(
    nxny=(len(gridx), len(gridy)),
    dxdy=(resolution, resolution),
    x0y0=(gpm25.bounds.minx.min(), gpm25.bounds.miny.min()),
    proj="epsg:3347",
    pixel_ref="corner",
).to_dataset()
## print dataset
krig_ds
[3]:
<xarray.Dataset>
Dimensions:  (x: 147, y: 124)
Coordinates:
  * x        (x) float64 3.46e+06 3.48e+06 3.5e+06 ... 6.36e+06 6.38e+06
  * y        (y) float64 4.984e+05 5.184e+05 5.384e+05 ... 2.938e+06 2.958e+06
Data variables:
    *empty*
Attributes:
    pyproj_srs:  +proj=lcc +lat_0=63.390675 +lon_0=-91.8666666666667 +lat_1=4...

Covariate#

We will use aersoal optical depth (AOD) as a covariate for universal kriging with specified drift. The data is from the modis aqua satellite during the datetime of interest

[4]:
aod_aqua_ds = salem.open_xr_dataset(str(data_dir) + f"/MYD04.2021197.G10.nc")

Set up specified drift#

For specified we need satellite derived aod at every aq monitor location and AOD on the same grid we are interpolating.

AOD at AQs location#

[5]:

y = xr.DataArray( np.array(df["lat"]), dims="ids", coords=dict(ids=df.id.values), ) x = xr.DataArray( np.array(df["lon"]), dims="ids", coords=dict(ids=df.id.values), ) var_points = aod_aqua_ds["AOD_550_GF_SM"].interp( Longitude=x, Latitude=y, method="linear" ) # print(var_points) if len(df.index) == len(var_points.values): var_points = var_points.values else: raise ValueError("Lenghts dont match")

AOD Data on grid#

Now we will transform the aod data to be on the grid we are interpolating too. This is feed in as a specified drift array when executing the interpolation.

[6]:
aod_aqua_ds_T = krig_ds.salem.transform(aod_aqua_ds)
var_array = aod_aqua_ds_T["AOD_550_GF_SM"].values

Plot AOD#

[7]:
ax = plt.axes(projection=ccrs.Orthographic(-80, 35))
ax.set_global()
aod_aqua_ds["AOD_550_GF_SM"].plot(ax=ax, transform=ccrs.PlateCarree())
ax.coastlines()
ax.set_extent([-132, -85, 35, 65], crs=ccrs.PlateCarree())
_images/comps-uk-aod_15_0.png

Setup UK#

[8]:
nlags = 15
variogram_model = "spherical"


startTime = datetime.now()
krig = UniversalKriging(
    x=gpm25["Easting"],  ## x location of aq monitors in lambert conformal
    y=gpm25["Northing"],  ## y location of aq monitors in lambert conformal
    z=gpm25["PM2.5"],  ## measured PM 2.5 concentrations at locations
    drift_terms=["specified"],
    variogram_model=variogram_model,
    nlags=nlags,
    specified_drift=[var_points],  ## aod at aq monitors
)
print(f"UK build time {datetime.now() - startTime}")
UK build time 0:02:36.174604

PyKrige will optimize most parameters based on user defined empirical model and the number of bins.

  • I tested several empirical models and bin sizes and found (for this case study) that a spherical model with 15 bins was optimal based on the output statics.

  • The literature supports spherical for geospatial interpolation applications over other methods.

[9]:
plotvariogram(krig)
_images/comps-uk-aod_19_0.png

Execute UK#

Interpolate data to our grid using UK with specified drift. Where the specified drift is the linear correlation of AOD to PM2.5 at all locations and on the interploated grid for kriging.

[10]:
startTime = datetime.now()
z, ss = krig.execute("grid", gridx, gridy, specified_drift_arrays=[var_array])
print(f"UK execution time {datetime.now() - startTime}")
UK_pm25 = np.where(z < 0, 0, z)

krig_ds["UK_pm25"] = (("y", "x"), UK_pm25)
UK execution time 0:00:06.199120

Plot UK#

Convert data to polygons to be plot-able on a slippy mapbox. This is not necessary but but :)

[11]:
polygons, values = pixel2poly(gridx, gridy, UK_pm25, resolution)
pm25_model = gpd.GeoDataFrame(
    {"Modelled PM2.5": values}, geometry=polygons, crs="EPSG:3347"
).to_crs("EPSG:4326")

fig = px.choropleth_mapbox(
    pm25_model,
    geojson=pm25_model.geometry,
    locations=pm25_model.index,
    color="Modelled PM2.5",
    color_continuous_scale="jet",
    center={"lat": 50.0, "lon": -110.0},
    zoom=2.5,
    mapbox_style="carto-positron",
    opacity=0.6,
)
fig.update_layout(margin=dict(l=0, r=0, t=30, b=10))
fig.update_traces(marker_line_width=0)